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Abstract: We propose a class of transformation hazard models for right- 
censored failure time data. It includes the proportional hazards model (Cox) 
and the additive hazards model (Lin and Ying) as special cases. Due to the re- 
quirement of a nonnegative hazard function, multidimensional parameter con- 
straints must be imposed in the model formulation. In the Bayesian paradigm, 
the nonlinear parameter constraint introduces many new computational chal- 
lenges. We propose a prior through a conditional-marginal specification, in 
which the conditional distribution is univariate, and absorbs all of the nonlin- 
ear parameter constraints. The marginal part of the prior specification is free 
of any constraints. This class of prior distributions allows us to easily com- 
pute the full conditionals needed for Gibbs sampling, and hence implement 
the Markov chain Monte Carlo algorithm in a relatively straightforward fash- 
ion. Model comparison is based on the conditional predictive ordinate and the 
deviance information criterion. This new class of models is illustrated with a 
simulation study and a real dataset from a melanoma clinical trial. 



1. Introduction 

In survival analysis and clinical trials, the Cox [10] proportional hazards model has 
been routinely used. For a subject with a possibly time-dependent covariate vector 
Z(i), the proportional hazards model is given by, 

(1.1) A(i|Z)=A (t)exp{/3'Z(t)}, 

where Ao (t) is the unknown baseline hazard function and (3 is the px 1 parameter 
vector of interest. Cox [11] proposed to estimate (3 under model (jl.ip by maxi- 
mizing the partial likelihood function and its large sample theory was established 
by Andersen and Gill [1]. However, the proportionality of hazards might not be a 
valid modeling assumption in many situations. For example, the true relationship 
between hazards could be parallel, which leads to the additive hazards model (Lin 
and Ying [24]), 

(1.2) A(t|Z)=A (t)+/3'Z(i). 

As opposed to the hazard ratio yielded in (jl.ip . the hazard difference can be ob- 
tained from (|1.2p , which formulates a direct association between the expected num- 
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ber of events or death occurrences and risk exposures. O'Neill [28] showed that use 
of the Cox model can result in serious bias when the additive hazards model is 
correct. Both the multiplicative and additive hazards models have sound biological 
motivations and solid statistical bases. 

Lin and Ying [25], Martinussen and Scheike [26] and Scheike and Zhang [30] 
proposed general additive-multiplicative hazards models in which some covariates 
impose the proportional hazards structure and others induce an additive effect on 
the hazards. In contrast, we link the additive and multiplicative hazards models in a 
completely different fashion. Through a simple transformation, we construct a class 
of hazard-based regression models that includes those two commonly used modeling 
schemes. In the usual linear regression model, the Box-Cox transformation [4] may 
be applied to the response variable, 



where lim 7 ^o(^ 7 — l)/7 = log(Y). This transformation has been used in survival 
analysis as well [2, 3, 5, 7, 13, 32]. Brcslow and Storer [7] and Barlow [3] applied 
this family of power transformations to the covariate structure to model the relative 
risk R(Z), 



where i£(Z) is the ratio of the incidence rate at one level of the risk factor to that 
at another level. Aranda-Ordaz [2] and Brcslow [5] proposed a compromise between 
these two special cases, 7 = or 1, while their focus was only on grouped survival 
data by analyzing sequences of contingency tables. Sakia [29] gave an excellent 
review on this power transformation. 

The proportional and additive hazards models may be viewed as two extremes 
of a family of regression models. On a basis that is very different from the available 
methods in the literature, we propose a class of regression models for survival data 
by imposing the Box-Cox transformation on both the baseline hazard Ao (t) and the 
hazard X(t\Z). This family of transformation models is very general, which includes 
the Cox proportional hazards model and the additive hazards model as special cases. 
By adding a transformation parameter, the proposed modeling structure allows a 
broad class of hazard patterns. In many applications where the hazards are neither 
proportional nor parallel, our proposed transformation model provides a unified 
and flexible methodology for analyzing survival data. 

The rest of this article is organized as follows. In Section 2.1, we introduce 
notation and a class of regression models based on the Box-Cox transformed haz- 
ards. In Section 2.2, we derive the likelihood function for the proposed model using 
piecewise constant hazards. In Section 2.3, wc propose a prior specification scheme 
incorporating the parameter constraints within the Bayesian paradigm. In Section 
3, we derive the full conditional distributions needed for Gibbs sampling. In Sec- 
tion 4, we introduce model selection methods based on the conditional predictive 
ordinate (CPO) in Geisser [14] and the deviance information criterion (DIC) pro- 
posed by Spiegelhalter et al. [31]. We illustrate the proposed methods with data 
from a melanoma clinical trial, and examine the model using a simulation study in 
Section 5. We give a brief discussion in Section 6. 



(1.3) 




7^0 
7 = 




7^0 
7 = 0, 
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2. Transformation hazard models 
2.1. A new class of models 

For n independent subjects, let Ti (i = 1, . . . , n) be the failure time for subject i and 
Zj(£) be the corresponding p x 1 covariate vector. Let Ci be the censoring variable 
and define Yi — min(Ti,Ci). The censoring indicator is Vi = /(Ti < Ci), where 
/(•) is the indicator function. Assume that Ti and Ci are independent conditional 
on 7ii{t), and that the triplets {(Ti, Ci, Z,;(i)), i — 1, ...,n} are independent and 
identically distributed. 

For right-censored failure time data, we propose a class of Box-Cox transforma- 
tion hazard models, 

(2.1) <f>{\(t\Zi)} = cp{\ (t)} + 0%(t), 

where (/>(•) is a known link function given by l|1.3j) . We take 7 as fixed throughout 
our development for the following reasons. First, our main goal is to model selection 
on 7, by fitting separate models for each value of 7 and evaluating them through a 
model selection criterion. Once the best 7 is chosen according to a model selection 
criterion, posterior inference regarding (/3, A) is then based on that 7. Second, in 
real data settings, there is typically very little information contained in the data 
to estimate 7 directly. Third, posterior estimation of 7 is computationally difficult 
and often numerically unstable due to the constraint (|2.3j) as well as its weak 
identifiability property. To understand how the hazard varies with respect to 7, we 
carried out a numerical study as follows. We assume that Xo(t) = t/3 in one case, 
and Xo(t) = t 2 /5 in another case. A single covariate Z takes a value of or 1 with 
probability .5, and 7 = (0, .25, .5, .75, 1). Model (|2.ip can be written as 

A(t|Z J ) = {Ao(i)^+7/3'Z J (t)} 1/ ^. 

As shown in Figure 1, there is a broad family of models for < 7 < 1. Our 
primary interest for 7 lies in [0, 1], which covers the two popular cases and a family 
of intermediate modeling structures between the proportional (7 = 0) and the 
additive (7 = 1) hazards models. 

Misspecified models may lead to severe bias and wrong statistical inference. In 
many applications where neither the proportional nor the parallel hazards assump- 
tion holds, one can apply (|2.ip to the data with a set of prespecified 7's, and choose 
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Fig 1. The relationships between X (t) and \(t\Z) = {A (t) 7 + jZ} 1 ^ , with Z = 0,1. Left: 
A (t) = t/3; right: X (t) = t 2 /5. 
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the best fitting model according to a suitable model selection criterion. The need 
for the general class of models in (|2.1[) can be demonstrated by the El 690 data 
from the Eastern Cooperative Oncology Group (ECOG) phase III melanoma clini- 
cal trial (Kirkwood et al. [23]). The objective of this trial was to compare high-dose 
interferon to observation (control). Relapse- free survival was a primary outcome 
variable, which was defined as the time from randomization to progression of tu- 
mor or death. As shown in Section 5, the best choice of 7 in the E1690 data is 
indeed neither nor 1, but 7 = .5. 

Due to the extra parameter 7, (3 is intertwined with Ao(i) in (|2.1|) . As a re- 
sult, the model is very different from either the proportional hazards model, which 
can be solved through the partial likelihood procedure, or the additive hazards 
model, where the estimating equation can be constructed based on martingale inte- 
grals. Here, we propose to conduct inference with this transformation model using 
a Bayesian approach. 

2.2. Likelihood function 

The piecewise exponential model is chosen for Ao(i). This is a flexible and commonly 
used modeling scheme and usually serves as a benchmark for the comparison of 
parametric and nonparamctric approaches (Ibrahim, Chen and Sinha [21]). Other 
nonparametric Bayesian methods for modeling Ao (t) are available in the literature 
[20, 22, 27]. Let ?/, be the observed time for the iih subject, y = (yi, . . . , y n )' , 
v = (yi, . . . , v n )', and Z(t) = (Zi(t), . . . , Z n (t))'. Let J denote the number of 
partitions of the time axis, i.e. < s\ < ■ ■ ■ < sj, s,j > yi for i = l,...,n, 
and that \o(y) = Aj for y E (sj-\,Sj], j = I,..., J. When J = 1, the model 
reduces to a parametric exponential model. By increasing J, the piecewise constant 
hazard formulation can essentially model any shape of the underlying hazard. The 
usual way to partition the time axis is to obtain an approximately equal number 
of failures in each interval, and to guarantee that each time interval contains at 
least one failure. Define Sij = 1 if the ith subject fails or is censored in the jth 
interval, and otherwise. Let D = (n,y,Z(t),u) denote the observed data, and 
A = (Ai, . . . , Aj)'. For ease of exposition and computation, let Zj = Z,(t), then the 
likelihood function is 

n ./ 

L(f3,\\D) = HH(X] + ll3%) 5i ^ h 
(2 2) i=lj=1 

x e -^{(Aj+7/3'Z0 1 /^(y l - Sj _ 1 )+X; 3 B= 1 i (A^+ 7/ 3'Z 1 ) 1 /-'( Sg - ;i3 - 1 )} 

2.3. Prior distributions 

The joint prior distribution of (/3, A) needs to accommodate the nonnegativity con- 
straint for the hazard function, that is, 

(2.3) AT + 7 /3 / Z i >0 (i = 1, . . . ,n; j = 1, . . . , J). 

Constrained parameter problems typically make Bayesian computation and anal- 
ysis quite complicated [8, 9, 16]. For example, the order constraint on a set of 
parameters (e.g., 9\ < 82 < ■ ■ ■ ) is very common in Bayesian hierarchical models. 
In these settings, closed form expressions for the normalizing constants in the full 
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conditional distributions are typically available. However, for our model, this is not 
the case; the normalizing constant involves a complicated intractable integral. The 
nonnegativity of the hazard constraint is very different from the usual order con- 
straints. If the hazard is negative, the likelihood function and the posterior density 
are not well defined. One way to proceed with this nonlinear constraint is to specify 
an appropriately truncated joint prior distribution for (/3, A), such as a truncated 
multivariate normal prior N(fi, X) for (/3|A) to satisfy this constraint. This would 
lead to a prior distribution of the form 

tt(/3, A) = 7r(/3|A)7r(A)7(Aj + 7 /3'Z, > 0, i = 1, . . . ,n; j = 1, . . . , J). 

Following this route, we would need to analytically compute the normalizing con- 
stant, 

c(A)= /-.. / exp\-h(3- t iys- 1 ((3- t i)\d(3 1 ---d(3 p 

J JAj+7/3'Zi>0forall»J I Z J 

to construct the full conditional distribution of A. However, c(A) involves a p- 
dimcnsional integral on a complex nonlinear constrained parameter space, which 
cannot be obtained in a closed form. Such a prior would lead to intractable full 
conditionals, therefore making Gibbs sampling essentially impossible. 

To circumvent the multivariate constrained parameter problem, we reduce our 
prior specification to a one-dimensional truncated distribution, and thus the nor- 
malizing constant can be obtained in a closed form. Without loss of generality, 
we assume that all the covariates are positive. Let %i(-k) denote the covariate Zj 
with the kth component Zn~ deleted, and let P(- k ) denote the (p — l)-dimensional 
parameter vector with f3 k removed, and define 

h f\ a i \ ■ f A I+7/3(- fc )Z i( - fc ) i 
K\ X 3>P(-k)> z i) = it 2 >■ 

We propose a joint prior for (/3, A) of the form 

(2.4) tt(/3, A)=7T(/? fc |/3 ( _ fc) , A)/ (/? fe >-/i 7 (Aj,/3(- fe ), Z 4 )) 7r(/3 ( _ fe) , A). 

We see that [3k and (/3(_ fe \,A) arc not independent a priori due to the nonlinear 
parameter constraint. This joint prior specification only involves one parameter (3^ 
in the constraints and makes all the other parameters (/3(_fc) , A) free of constraints. 

Let $(•) denote the cumulative distribution function of the standard normal 
distribution. Specifically, we take (/3fc|/3(_fc), A) to have a truncated normal distri- 
bution, 

exp{— 

(2.5) n(p k |/3 ( _ fc) , A) = — I (p k > -h^Xj , /3 ( _ fc) , Z,)) , 
where the normalizing constant depends on and A, given by 

(2.6) c(/3 { _ k) ,X) = V2^a k 

Thus, we need only to constrain one parameter (3 k to guarantee the nonnegativity 
of the hazard function and allow the other parameters, (/3(_ fe ), A), to be free. 



I — $ — 



/i 7 (Aj,P(_ fe ), Zj, 
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Although not required for the development, we can take fi(-k) and A to be 
independent a priori in (|2.4p . 7r(/3/_^,A) = 7r(/3/_^)7r(A). In addition, we can 
specify a normal prior distribution for each component of f3(_ k y We assume that 
the components of A are independent a priori, and each A.,- has a Gamma(a, £) 
distribution. 

3. Gibbs sampling 

For < 7 < 1, it can be shown that the full conditionals of (/3i, . . . , f3 p ) are 
log-concave, in which case we only need to use the adaptive rejection sampling 
(ARS) algorithm proposed by Gilks and Wild [19]. Due to the non- log-concavity 
of the full conditionals of the Xj 's, a Metropolis step is required within the Gibbs 
steps, for details see Gilks, Best and Tan [18]. For each Gibbs sampling step, the 
support for the parameter to be sampled is set to satisfy the constraint (|2.3[) . 
such that the likelihood function is well defined within the sampling range. For 
i = 1, . . . , n; j = 1, . . . , J; k = 1, . . . ,p, the following inequalities need to be satisfied, 

Pk > -M^/V*)' 2 *); X i > -min^'Z,) 177 ^}. 

Suppose that the A:th component of /3 has a truncated normal prior as given in 
()2.5|) . and all other parameters are left free. The full conditionals of the parameters 
are given as follows: 

7r(/3 fe |/3 ( _ fe)) A,D) cx L(p,\\D)K(p k \p ( _ k) ,\) 
7r(A|/3 ( _ ,A,D) cx L(/3,A|-D)7r(A)/c(/3 ( _ fc) ,A) 
""(A/IA X(-j),D) cx L(/3,A|-D) 7 r(A J -)/c(/3 ( _ fe) ,A) 

where 

7r(A) cx exp{-/3 2 /(2a 2 )}, I ± k, I = 1, . . . ,p, 
cx A" -1 exp(— £Aj), j = l,...,J. 

These full conditionals have nice tractable structures, since c(/3(_ fc ), A) has a closed 
form with our proposed prior specification. Posterior estimation is very robust with 
respect to the conditioning scheme (the choice of k) in (|2.4|) . 

4. Model assessment 

It is crucial to compare a class of competing models for a given dataset and select 
the model that best fits the data. After fitting the proposed models for a set of pre- 
specified 7's, we compute the CPO and DIC statistics, which are the two commonly 
used measures of model adequacy [14, 15, 12, 31]. 

We first introduce the CPO as follows. Let Z^ - ^ denote the (n-l)xp covariate 
matrix with the ith row deleted, let y^ -1 -* denote the (n — 1) x 1 response vector 
with yi deleted, and iA -8 ) is defined similarly. The resulting data with the ith case 
deleted can be written as = {{n - 1), y^), Z^ 1 ' , u^}. Let /(yi|Z,-, (3, A) 

denote the density function of yi, and let 7r(/3, X\D^^) denote the posterior density 
of (f3, A) given D^K Then, CPO; is the marginal posterior predictive density of 
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Ui given l \ which can be written as 

CP0 4 = /(i^Z,, £><"*>) 

= J J f( yi \Zi,0,\)n(p,\\D(-Q)df3d\ 

For the proposed transformation model, a Monte Carlo approximation of CPO^ is 
given by, 

era f'f ' r' 

where 

J 

x exp [-<M(Aj >[m] + 10 [m] Zi) lh {yi - sj-i) 
+ E( A Im+^M Z «) 1/7 (^-^-i)} 

3=1 

Note that M is the number of Gibbs samples after burn-in, and A[ m ] = (A 1; [ TO ], . . . , 
Aj,[ TO ])' an( i /3[m] are tbe samples of the mth Gibbs iteration. A common summary 
statistic based on the CPO^'s is B = Ym=i l°g(CPOj), which is often called the 
logarithm of the pseudo Bayes factor. A larger value of B indicates a better fit of 
a model. 

Another model assessment criterion is the DIC (Spiegelhalter et al. [31]), defined 

as 

DIC = 2Dev(/3,A) - Dev(/3, A), 



where Dev(/3, A) = — 2 logL(/3, \\D) is the deviance, and Dev(/3, A), (3 and A are 
the corresponding posterior means. Specifically, in our proposed model, 

4 M 

DIC = ~m £ ^ g L{l3 {m] ,X [m] \D) + 21ogL(P,\\D). 

m—1 

The smaller the DIC value, the better the fit of the model. 

5. Numerical studies 
5. 1 . Application 

As an illustration, we applied the transformation models to the El 690 data. There 
were a total of n = 427 patients on these combined treatment arms. The covari- 
ates in this analysis were treatment (high-dose interferon or observation), age (a 
continuous variable which ranged from 19.13 to 78.05 with mean 47.93 years), sex 
(male or female) and nodal category (1 if there were no positive nodes, or 2 oth- 
erwise). Figure 2 shows the estimated cumulative hazard curves for the interferon 
and observation groups based on the Nelson- Aalen estimator. 
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2 4 6 
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Fig 2. The estimated cumulative hazard curves for the two arms in El 690 



Table 1 

The B/DIC statistics with respect to 7 and J in the El 690 data 



J 




1 


5 


10 





-567.43/1129.19 


-528.36/1051.84 


-555.46/1105.48 


.25 


-567.96/1131.71 


-523.74/1045.68 


-534.57/1066.86 


7 -5 


-568.47/1133.72 


-522.55/1043.64 


-529.13/1056.44 


.75 


-568.89/1135.16 


-522.66/1043.86 


-527.47/1053.17 


1 


-569.46/1136.54 


-523.04/1044.84 


-526.80/1052.06 



We constrained the regression coefficient for treatment, (3i, to have the trun- 
cated normal prior. We prespecified 7 = (0, .25, .5, .75, i) and took the priors for 
(3 = (/3i, /?2, /?3, (3a)' and A — (Ai, . . . , A,/)' to be noninformative. For example, 
(/?x|^ij9(_i)) was assigned the truncated 7V(0, 10,000) prior as defined in (|2.5[) . 
(Pi, I — 2,3,4) were taken to have independent iV(0, 10,000) prior distributions, 
and Xj ~ Gamma(2, .01), and independent for j — 1, . . . , J. To allow for a fair 
comparison between different models using different 7's, we used the same nonin- 
formative priors across all the targeted models. 

The shape of the baseline hazard function is controlled by J. The finer the 
partition of the time axis, the more general the pattern of the hazard function that 
is captured. However, by increasing J, we introduce more unknown parameters 
(the Aj's). For the proposed transformation model, 7 also directly affects the shape 
of the hazard function, and specifically, there is much interplay between J and 7 
in controlling the shape of the hazard, and in some sense 7 and J are somewhat 
confounded. Thus when searching for the best fitting model, we must find suitable 
J and 7 simultaneously. Similar to a grid search, we set J — (1,5, 10), and located 
the point (J, 7) that yielded the largest B statistic and the smallest DIC. 

After a burn-in of 2,000 samples and thinned by 5 iterations, the posterior com- 
putations were based on 10,000 Gibbs samples. The B and DIC statistics for model 
selection are summarized in Table 1. The two model selection criteria are quite con- 
sistent with each other, and both lead to the same best model with J = 5 and 7 = .5. 
Table 2 summarizes the posterior means, standard deviations and the 95% highest 
posterior density (HPD) intervals for (3 using J = (1, 5, 10) and 7 = (0, .5, 1). For 
the best model (with J = 5 and 7 = .5), we see that the treatment effect has a 95% 
HPD interval that does not include 0, confirming that treatment with high-dose 
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Table 2 



Posterior means 


, standard deviations, and 95% HPD intervals for the El 690 data 


J 7 


Covariate 


Mean 


SD 


95% HPD Interval 


1 


Treatment 


-.2888 


.1299 


(-.5369, -.0310) 




Age 


.0117 


.0050 


(.0016, .0214) 




Sex 


-.3479 


.1375 


(-.6372, -.0962) 




Nodal Category 


.5267 


.1541 


(.2339, .8346) 


.5 


Treatment 


-.1398 


.0626 


(-.2588, -.0111) 




Age 


.0056 


.0024 


(.0011, .0103) 




Sex 


-.1464 


.0644 


(-.2791, -.0254) 




Nodal Category 


.2179 


.0688 


(.0835, .3529) 


1 


Treatment 


-.0655 


.0299 


(-.1245, -.0078) 




Age 


.0026 


.0011 


(.0004, .0047) 




Sex 


-.0593 


.0293 


(-.1155, -.0007) 




Nodal Category 


.0863 


.0296 


(.0304, .1471) 


5 


Treatment 


-.4865 


.1295 


(-.7492, -.2408) 




Age 


-.0036 


.0050 


(-.0133, .0061) 




Sex 


-.4423 


.1421 


(-.7196, -.1684) 




Nodal Category 


.1461 


.1448 


(-.1307, .4298) 


.5 


Treatment 


-.1835 


.0626 


(-.3066, -.0604) 




Age 


.0017 


.0024 


(-.0030, .0064) 




Sex 


-.1557 


.0655 


(-.2853, -.0310) 




Nodal Category 


.1141 


.0685 


(-.0179, .2510) 


1 


Treatment 


-.0525 


.0274 


(-.1058, .0007) 




Age 


.0011 


.0009 


(-.0006, .0027) 




Sex 


-.0334 


.0249 


(-.0818, .0148) 




Nodal Category 


.0265 


.0224 


(-.0169, .0705) 


10 


Treatment 


-.7238 


.1260 


(-.9639, -.4710) 




Age 


-.0175 


.0047 


(-.0269, -.0084) 




Sex 


-.6368 


.1439 


(-.9158, -.3544) 




Nodal Category 


.1685 


.1302 


(-.4184, .0859) 


.5 


Treatment 


-.2272 


.0629 


(-.3581, -.1094) 




Age 


-.0009 


.0023 


(-.0056, .0035) 




Sex 


-.1791 


.0649 


(-.3094, -.0546) 




Nodal Category 


.0534 


.0670 


(-.0814, .1798) 


1 


Treatment 


-.0610 


.0274 


(-.1142, -.0070) 




Age 


.0006 


.0008 


(-.0010, .0021) 




Sex 


-.0334 


.0256 


(-.0850, .0155) 




Nodal Category 


.0107 


.0225 


(-.0325, .0569) 



interferon indeed substantially reduced the risk of melanoma relapse compared to 
observation. 

In Figure 3, we present the estimated hazards for the interferon and observation 
arms for 7 = 0, .5 and 1 using J = 5. It is important to note that, when 7 = .5, the 
hazard ratio increases over time while the hazard difference decreases. 

The proportional hazards model yields a hazard ratio of 1.63, the additive haz- 
ards model gives a hazard difference of .05, and the model with 7 = .5 shows 
hazard ratios of 1.27, 1.36 and 1.61, and hazard differences of .14, .11 and .07 at .5, 
1 and 3 years, respectively. This interesting feature between the hazards cannot be 
captured through a conventional modeling structure. An opposite phenomenon in 
which the difference of the hazards increases in t whereas their ratio decreases, was 
noted in the British doctors study (Breslow and Day [6], p. 112, pp. 336-338), which 
examined the effects of cigarette smoking on mortality. We also computed the half 
year and one year posterior predictive survival probabilities for a 48 years old male 
patient under the high-dose interferon treatment with one or more positive nodes. 
When 7 = .5, the .5 year posterior predictive survival probabilities are .8578, .7686 
and .7804 for J = 1, 5 and 10; the 1 year survival probabilities are .7357, .6043 and 
.6240, respectively. When J is large enough, the posterior inference becomes stable. 
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Cox proportional hazards model (gamma-0) 



Additive hazards model (gamma-1) 




Time (years) 



Fig 3. Estimated hazards under models with 7 = 0, .5 and 1, for male subjects at age= 47.93 
years and with one or more positive nodes, using J = 5. 



Table 3 

Sensitivity analysis with having a truncated normal prior using J = 5 and 7 = .5 



Truncated Covariate 


Regression Coefficient 


Mean 


SD 


95% HPD Interval 


Age 


Treatment 


-.1862 


.0633 


(-.3122, -.0627) 




Age 


.0016 


.0024 


(-.0032, .0063) 




Sex 


-.1551 


.0665 


(-.2802, -.0187) 




Nodal Category 


.1132 


.0697 


(-.0229, .2511) 


Sex 


Treatment 


-.1883 


.0634 


(-.3107, -.0592) 




Age 


.0017 


.0024 


(-.0032, .0063) 




Sex 


-.1572 


.0651 


(-.2801, -.0296) 




Nodal Category 


.1131 


.0672 


(-.0165, .2448) 


Nodal Category 


Treatment 


-.1850 


.0633 


(-.3037, -.0566) 




Age 


.0017 


.0024 


(-.0030, .0062) 




Sex 


-.1519 


.0662 


(-.2819, -.0236) 




Nodal Category 


.1124 


.0679 


(-.0223, .2416) 



We examined MCMC convergence based on the method proposed by Geweke 
[17]. The Markov chains mixed well and converged fast. We conducted a sensitivity 
analysis on the choice of the conditioning scheme in the prior (12. 5|) by choosing 
the regression coefficient of each covariate to have a truncated normal prior. The 
results in Table 3 show the robustness of the model to the choice of the constrained 
parameter in the prior specification. This demonstrates the appealing feature of 
the proposed prior specification, which thus facilitates an attractive computational 
procedure. 

5. 2. Simulation 

We conducted a simulation study to examine properties of the proposed model. The 
failure times were generated from model (12. ip with 7 = .5. We assumed a constant 
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Table 4 

Simulation results based on 500 replications, 
with the true values /3i = .7 and ft = 1 



n 


c% 


Mean (ft) 


SD (ft) 


Mean (ft) 


SD (ft) 


300 





.7705 


.2177 


1.0556 


.4049 




25 


.7430 


.2315 


1.0542 


.4534 


500 





.7424 


.1989 


1.0483 


.3486 




25 


.7510 


.2084 


1.0503 


.3781 


1000 





.7273 


.1784 


1.0412 


.2920 




25 


.7394 


.1869 


1.0401 


.3100 



baseline hazard, i.e., Ao(t) = .5, and two covariates were generated independently: 
Z\ ~ A(5, 1) and Zi is a binary random variable taking a value of 1 or 2 with 
probability .5. The corresponding regression parameters were fix = .7 and /?2 = 1. 
The censoring times were simulated from a uniform distribution to achieve approx- 
imately a 25% censoring rate. The sample sizes were n = 300, 500 and 1,000, and 
we replicated 500 simulations for each configuration. 

Noninformative prior distributions were specified for the unknown parameters as 
in the E1690 example. For each Markov chain, we took a burn-in of 200 samples and 
the posterior estimates were based on 5,000 Gibbs samples. The posterior means 
and standard deviations are summarized in Table 4, which show the convergence 
of the posterior means of the parameters to the true values. As the sample size 
increases, the posterior means of /3i and (3% approach their true values and the 
corresponding standard deviations decrease. As the censoring rate increases, the 
posterior standard deviation also increases. 

6. Discussion 

We have proposed a class of survival models based on the Box-Cox transformed 
hazard functions. This class of transformation models makes hazard-based regres- 
sion more flexible, general, and versatile than other methods, and opens a wide 
family of relationships between the hazards. Due to the complexity of the model, 
we have proposed a joint prior specification scheme by absorbing the non-linear 
constraint into one parameter while leaving all the other parameters free of con- 
straints. This prior specification is quite general and can be applied to a much 
broader class of constrained parameter problems arising from regression models. It 
is usually difficult to interpret the parameters in the proposed model except when 
7 = or 1. However, if the primary aim is for prediction of survival, the best fitting 
Box-Cox transformation model could be useful. 
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